Abstract: The remote sensing image archive is increasing day by day. The storage, organization and
retrieval of these images pose a challenge to the scientific community. In this paper we have
developed a system for retrieval of remote sensing images on the basis of color moment and gray level
co-occurrence matrix feature extractors. The results obtained through the prototype system are encouraging.
I. Introduction

Content-based image retrieval (CBIR) technology was proposed as an image retrieval
technology that searches images using their visual content, such as color, texture, shape and spatial
relationship, rather than image annotations. It resolves some traditional image retrieval problems, for
example, manual annotation of images brings users a large workload and inaccurate subjective description.
After more than one decade, it has developed into a content-based vision information retrieval
technology covering image information and video information. Great progress has been made in theory
and applications.

At present, CBIR technology has been applied successfully in fields such as face recognition, fingerprint
recognition, medical image databases and trademark registration, for example in the QBIC
system of IBM Corporation, the Photobook system of MIT Media Laboratory and the Virage system of Virage
Corporation. It is difficult to apply these systems to a massive remote sensing image archive because
remote sensing images have many distinctive features, including various data types, a mass of data, different
resolution scales and different data sources, which restrict the application of CBIR technology in
the remote sensing image field. In order to change the current situation, we must resolve some problems as
follows.

1. Storing massive remote sensing image data.

2. Designing a reasonable physical and logical pattern for the remote sensing image database.

3. Adopting an appropriate image feature extraction algorithm.

4. Adopting an indexing structure for search.

5. Designing a reasonable content-based searching system for a massive remote sensing image
database.

The rest of the paper is arranged as follows. In Sec. 2, we discuss the methodology. In Sec. 3, the
experimental setup and the results obtained are discussed. We conclude in Sec. 4.

II. Methodology 

For practical applications, users are often interested in a partial region or in targets, such as military
targets, public targets and ground resource targets in a remote sensing image, instead of the entire image. For
example, small-scale important targets and regions of a remote sensing image attract more attention than
the entire image in applications. The image slice features of these important targets and regions,
extracted by color, texture, shape, spatial relationship, etc., are stored in a feature database. Efficient indexing
technology is a key factor for applying content-based image retrieval to a massive image database
successfully. Indexing technology developed from traditional databases and has subsequently been applied in the
content-based image retrieval field. Fig. 1 shows an architectural framework for content-based remote
sensing image retrieval.

Traditionally, satellite image classification has been done at the pixel level. For a typical LISS III image with
23.5 m resolution, a 100 x 100 pixel image patch covers roughly 7.2 km². This is too large an area to
represent precise ground segmentation, but our focus is more on building a querying and browsing
system than on showing exact boundaries between classes. Dividing the image into rectangular patches
makes it very convenient for training as well as browsing. Since users of such systems are generally
more interested in getting an overview of the location, zooming and panning are allowed optionally as
part of the interface.




Figure 1: Architectural Framework of CBIR system

We have developed a prototype system for image retrieval. In this system a query image is taken and images
similar to the query image are found on the basis of color and texture similarity. The four main tasks of
the system are:

1. Color Moment Feature Extraction.

2. GLCM Texture Feature Extraction.

3. K-means clustering to form index.

4. Retrieval between the query image and database.

2.1 Color Moment 

We define the i-th color channel at the j-th image pixel as p_{ij}. With N pixels in the image, the three color moments can then be
defined as:

Moment 1 - Mean:

\[ E_i = \frac{1}{N} \sum_{j=1}^{N} p_{ij} \]

Mean can be understood as the average color value in the image.

Moment 2 - Standard Deviation:

\[ \sigma_i = \sqrt{\frac{1}{N} \sum_{j=1}^{N} (p_{ij} - E_i)^2} \]

The standard deviation is the square root of the variance of the distribution.

Moment 3 - Skewness:

\[ s_i = \sqrt[3]{\frac{1}{N} \sum_{j=1}^{N} (p_{ij} - E_i)^3} \]

Skewness can be understood as a measure of the degree of asymmetry in the distribution.
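
The three moments described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the prototype's actual code: the function name `color_moments` is hypothetical, the input is assumed to be an H x W x C array, and skewness is taken as the cube root of the third central moment, which is one common convention.

```python
import numpy as np

def color_moments(image):
    """Return [means, standard deviations, skewness values], one per channel.

    `image` is assumed to be an H x W x C array (hypothetical layout).
    """
    # Flatten the spatial dimensions: one row per pixel, one column per channel.
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    # Square root of the variance (second central moment).
    std = np.sqrt((centered ** 2).mean(axis=0))
    # Cube root of the third central moment; np.cbrt keeps the sign.
    skew = np.cbrt((centered ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])
```

For a three-channel patch this yields a 9-element color feature vector.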

2.2 Grey-Level Co-Occurrence Matrix Texture 

Grey-Level Co-occurrence Matrix texture measurements have been the workhorse of image texture since
they were proposed by Haralick in the 1970s. To many image analysts, they are a button you push in the
software that yields a band whose use improves classification, or not. The original works are necessarily
condensed and mathematical, making the process difficult to understand for the student or front-line
image analyst.

Calculate the selected feature. This calculation uses only the values in the GLCM. The features used are:

Contrast
Correlation
Energy
Homogeneity

Features are calculated with distance 1 and angles 0, 45 and 90 degrees.
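
As an illustration of how these four features can be computed from a GLCM at distance 1 and angles 0, 45 and 90 degrees, here is a minimal NumPy sketch. The paper does not state its formulas, so the definitions below are assumptions: contrast as the sum of p(i,j)(i-j)^2, energy as the sum of p(i,j)^2, homogeneity as the sum of p(i,j)/(1+|i-j|), and the usual normalised correlation. The function names, the symmetric normalised matrix, and the quantisation into `levels` grey levels are also illustrative choices.

```python
import numpy as np

# Pixel offsets (dx, dy) for distance 1; image rows grow downwards.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1)}

def glcm(gray, dx, dy, levels):
    """Symmetric, normalised co-occurrence matrix of an integer image."""
    h, w = gray.shape
    counts = np.zeros((levels, levels), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[gray[y, x], gray[y2, x2]] += 1
    counts = counts + counts.T          # count each pair in both directions
    return counts / counts.sum()        # convert counts to probabilities

def glcm_features(p):
    """Contrast, correlation, energy and homogeneity of a normalised GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    denom = sd_i * sd_j
    # Correlation is undefined for a constant image; 0.0 is an arbitrary convention.
    corr = np.sum(p * (i - mu_i) * (j - mu_j)) / denom if denom > 0 else 0.0
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "correlation": float(corr),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
    }

def texture_vector(gray, levels):
    """Concatenate the four features for angles 0, 45 and 90 degrees."""
    feats = []
    for angle in (0, 45, 90):
        dx, dy = OFFSETS[angle]
        f = glcm_features(glcm(gray, dx, dy, levels))
        feats.extend([f["contrast"], f["correlation"], f["energy"], f["homogeneity"]])
    return np.array(feats)
```

The explicit double loop keeps the counting step easy to follow; for full-size images a vectorised or library implementation would be preferable.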

2.3 K-Means Clustering 

A cluster is a collection of data objects that are similar to one another within the same cluster and are
dissimilar to the objects in the other clusters. The k-means algorithm is well suited for data mining because of its
efficiency in processing large data sets. It is defined as follows:

The k-means algorithm is built upon four basic operations:

1. Selection of the initial k means for k clusters.

2. Calculation of the dissimilarity between an object and the mean of a cluster.

3. Allocation of an object to the cluster whose mean is nearest to the object.

4. Re-calculation of the mean of a cluster from the objects allocated to it so that the intra-cluster
dissimilarity is minimized.
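
The four operations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's implementation: the initial means are drawn at random from the data, dissimilarity is Euclidean distance, and the names `kmeans` and `assign`, the iteration cap and the seed are all hypothetical.

```python
import numpy as np

def assign(features, means):
    """Operations 2 and 3: distance to every mean, then nearest-mean label."""
    dists = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k, iterations=100, seed=0):
    """Cluster the rows of `features` (an N x D float array) into k groups."""
    rng = np.random.default_rng(seed)
    # Operation 1: pick k distinct objects as the initial means.
    start = rng.choice(len(features), size=k, replace=False)
    means = features[start].astype(np.float64)
    for _ in range(iterations):
        labels = assign(features, means)
        # Operation 4: recompute each mean from the objects allocated to it.
        new_means = means.copy()
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:        # keep the old mean for an empty cluster
                new_means[c] = members.mean(axis=0)
        if np.allclose(new_means, means):
            break                       # means stopped moving: converged
        means = new_means
    return assign(features, means), means
```

The returned label of each feature vector plays the role of the class or group number used for indexing.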

The advantage of the K-means algorithm is that it works well when clusters are not well separated from each
other, which is frequently encountered in images. The cluster number allotted to each image is
considered its class or group.

Many similarity measures have been developed for image retrieval based on empirical estimates of the
feature extraction. We have used Euclidean distance for similarity matching.

The Euclidean distance between two points P = (p_1, p_2, ..., p_n) and Q = (q_1, q_2, ..., q_n) in Euclidean n-space is defined as:

\[ d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \]

Now, for retrieval, the user selects the query patch and, on the basis of its class number, the
distance between the query patch and the other images of that class is calculated and the images are
retrieved.


III. Experimental Setup



For our experiments, we use 3 LISS III + multi-spectral satellite images with 23.5 m resolution. We choose
to support 4 semantic categories in our experimental system, namely mountain, water bodies, vegetation, and
residential area. In consultation with an expert in satellite image analysis, we choose near-IR (infra-red), red
and green bands as the three spectral channels for classification as well as display. The reasons for this
choice are as follows. The near-IR band is selected over the blue band because of a somewhat inverse
relationship between a healthy plant's reflectivity in near-IR and red, i.e., healthy vegetation reflects high in
near-IR and low in red. The near-IR and red bands are key to differentiating between vegetation types
and states. Blue light is very abundant in the atmosphere and is scattered in all directions. It
is therefore very noisy. Hence use of the blue band is often avoided. Visible green is used because it is less
noisy and provides unique information compared to near-IR and red. The pixel dimensions of each
satellite image used in our experiments are 720x540, with geographic dimensions being approximately
51.84 km x 38.88 km. The choice of patch size is critical. A patch should be large enough to encapsulate the
visual features of a semantic category, while being small enough to include only one semantic category
in most cases. We choose a patch size of 100x100 pixels. We obtain 80 patches from all the images in this
manner. These patches are stored in a database along with the identity of their parent images and the relative
location within them. Ground truth categorization is not readily available for our patches.
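
One way to cut an image into fixed-size patches while recording each patch's location in its parent image is sketched below. The paper does not say how its patches were positioned or selected, so the non-overlapping grid and the dropping of partial patches at the right and bottom edges are assumptions; the function name `tile_patches` is hypothetical.

```python
import numpy as np

def tile_patches(image, patch=100):
    """Return a list of (row, col, patch_array) for full, non-overlapping patches.

    `image` is assumed to be an H x W or H x W x C array; (row, col) is the
    top-left pixel of the patch within the parent image.
    """
    h, w = image.shape[:2]
    patches = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            patches.append((r, c, image[r:r + patch, c:c + patch]))
    return patches
```

Storing `(row, col)` alongside a parent-image identifier is enough to locate a retrieved patch again in the full scene for browsing.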

The four main classifications of images are shown in Figures 2 to 5. Figures 6 and 7 show the content-based
retrieval system. We get 80% to 83% accuracy in our results.







Figure 2: Water bodies 

Figure 5: Vegetation and Mountain

IV. Conclusions

For retrieving images similar to a given query image we have developed a prototype system. We get fruitful
results on the sample images used in the experiments. We can use this technique for mining similar
images based on content and knowledge base for finding vegetation or water or building areas.